The mission of the Center for Research on Elementary and Middle Schools is to 
produce useful knowledge about how elementary and middle schools can foster 
growth in students' learning and development, to develop and evaluate practical 
methods for improving the effectiveness of elementary and middle schools based 
on existing and new research findings, and to develop and evaluate specific 
strategies to help schools implement effective research-based school and 
classroom practices. 

The Center conducts its research in three program areas: (1) Elementary 
Schools, (2) Middle Schools, and (3) School Improvement. 

The Elementary School Program 

This program works from a strong existing research base to develop, evaluate, 
and disseminate effective elementary school and classroom practices; synthesizes 
current knowledge; and analyzes survey and descriptive data to expand the 
knowledge base in effective elementary education. 

The Middle School Program 

This program's research links current knowledge about early adolescence as a 
stage of human development to school organization and classroom policies and 
practices for effective middle schools. The major task is to establish a 
research base to identify specific problem areas and promising practices in 
middle schools that will contribute to effective policy decisions and the 
development of effective school and classroom practices. 

School Improvement Program 

This program focuses on improving the organizational performance of schools 
in adopting and adapting innovations and developing school capacity for change. 



This report, prepared by the Elementary School Program, synthesizes research 
on the effects of mastery learning on student achievement. 
Mastery Learning Reconsidered 
Abstract 



Several recent reviews and meta-analyses have claimed extraordinarily 
positive effects of mastery learning on student achievement, and Bloom 
(1984a, b) has hypothesized that mastery-based treatments will soon be able 
to produce "two-sigma" (i.e., two standard deviation) increases in achievement. 
This article examines the literature on achievement effects of practical 
applications of group-based mastery learning in elementary and secondary 
schools over periods of at least four weeks, using a review technique, 
"best-evidence synthesis," which combines features of meta-analytic and 
traditional narrative reviews. The review found essentially no evidence to 
support the effectiveness of group-based mastery learning on standardized 
achievement measures. On experimenter-made measures, effects were generally 
positive but moderate in magnitude, with little evidence that effects main- 
tained over time. These results are discussed in light of the coverage vs. 
mastery dilemma posed by group-based mastery learning. 
Mastery Learning Reconsidered 



The term "mastery learning" refers to a large and diverse category of 
instructional methods. The principal defining characteristic of mastery 
learning methods is the establishment of a criterion level of performance 
held to represent "mastery" of a given skill or concept, frequent assessment 
of student progress toward the mastery criterion, and provision of correc- 
tive instruction to enable students who do not initially meet the mastery 
criterion to do so on later parallel assessments (see Bloom, 1976; Block & 
Anderson, 1975). Bloom (1976) also includes an emphasis on appropriate use 
of such instructional variables as cues, participation, feedback, and reinforcement 
as elements of mastery learning, but these are not uniquely defining 
characteristics; rather, what defines mastery learning approaches is the 
organization of time and resources to ensure that most students are able to 
master instructional objectives. 

There are three primary forms of mastery learning. One, called the Per- 
sonalized System of Instruction (PSI) or the Keller Plan (Keller, 1968), is 
used primarily at the post-secondary level. In this form of mastery learning, 
unit objectives are established for a course of study and tests are 
developed for each. Students may take the test (or parallel forms of it) as 
many times as they wish until they achieve a passing score. To do this, 
students typically work on self-instructional materials and/or work with 
peers to learn the course content, and teachers may give lectures more to 
supplement than to guide the learning process (see Kulik, Kulik, and Cohen, 
1979). A related form of mastery learning is continuous progress (e.g., 
Cohen, 1977), where students work on individualized units entirely at their 
own rate. Continuous progress mastery learning programs differ from other 
individualized models only in that they establish mastery criteria for unit 
tests and provide corrective activities to students who do not meet these 
criteria the first time. 

The third form of mastery learning is called group-based mastery learning, 
or Learning for Mastery (LFM) (Block & Anderson, 1975). This is by far 
the most commonly used form of mastery learning in elementary and secondary 
schools. In group-based mastery learning the teacher instructs the entire 
class at one pace. At the end of each unit of instruction a "formative 
test" is given, covering the unit's content. A mastery criterion, usually 
in the range of 80-90% correct, is established for this test. Any students 
who do not achieve the mastery criterion on the formative test receive cor- 
rective instruction, which may take the form of tutoring by the teacher or 
by students who did achieve at the criterion level, small group sessions in 
which teachers go over skills or concepts students missed, alternative 
activities or materials for students to complete independently, and so on. 
In describing this form of mastery learning, Block and Anderson (1975) 
recommend that corrective activities be different from the kinds of activities 
used in initial instruction. Following the corrective instruction, 
students take a parallel formative or "summative" test. In some cases only 
one cycle of formative test-corrective instruction-parallel test is used, 
and the class moves on even if several students still have not achieved the 
mastery criterion; in others, the cycle may be repeated two or more times 
until virtually all students have gotten a passing score. All students who 
achieve the mastery criterion at any point are generally given an "A" on the 
unit, regardless of how many tries it took for them to reach the criterion 
score. 

The most recent full-scale review of research on mastery learning was 
published more than a decade ago, by Block and Burns (1976). However, in 
recent years two meta-analyses of research in this area have appeared, one 
by Kulik, Kulik, and Bangert-Drowns (1986) and one by Guskey and Gates 
(1985, 1986). Meta-analyses characterize the impact of a treatment on a set 
of related outcomes using a common metric called "effect size," the posttest 
score for the experimental group minus that for the control group divided by 
the control group's standard deviation (see Glass, McGaw, and Smith, 1981). 
For example, an effect size of 1.0 would indicate that on the average, an 
experimental group exceeded a control group by one standard deviation; the 
average member of the experimental group would score at the level of a stu- 
dent in the 84th percentile of the control group's distribution. 
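
The effect size definition above can be illustrated with a minimal Python sketch. The function names and the score lists are hypothetical and are not taken from any study reviewed here. The sketch assumes the sample standard deviation of the control group as the divisor and, for the percentile interpretation, normally distributed control scores.

```python
from statistics import NormalDist, mean, stdev


def effect_size(experimental, control):
    """Posttest mean difference divided by the control group's standard deviation.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    return (mean(experimental) - mean(control)) / stdev(control)


def percentile_in_control(es):
    """Percentile of the control distribution reached by the average
    experimental student, assuming normally distributed control scores."""
    return 100 * NormalDist().cdf(es)


# Hypothetical posttest scores, for illustration only.
control_scores = [60, 65, 70, 75, 80]
experimental_scores = [68, 73, 78, 83, 88]

if __name__ == "__main__":
    es = effect_size(experimental_scores, control_scores)
    print(f"effect size = {es:.2f}")
    print(f"an effect size of 1.0 corresponds to percentile "
          f"{percentile_in_control(1.0):.1f} of the control group")
```

Under the normality assumption, an effect size of 1.0 maps to roughly the 84th percentile of the control distribution, which matches the interpretation given in the text.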

Both of the recent meta-analyses of research on mastery learning report 
extraordinarily positive effects of this method on student achievement. Kulik 
et al. (1986) find mean effect sizes of 0.52 for pre-college studies and 
0.54 for college studies. Guskey and Gates (1985) claim effect sizes of 
0.94 at the elementary level (grades 1-8), 0.72 at the high school level, 
and 0.65 at the college level. Further, Walberg (1984) reports a mean 
effect size of 0.81 for "science mastery learning" and Lysakowski and Wal- 
berg (1982) estimate an effect size for "cues, participation, and corrective 
feedback," principal components of mastery learning, at 0.97. Bloom (1984, 
p. 7) claims an effect size of 1.00 "when mastery learning procedures are 
done systematically and well," and has predicted that forms of mastery 
learning will be able to consistently produce achievement effects of "two 
sigma" (i.e., effect sizes of 2.00). To put these effect sizes in perspective, 
consider that the mean effect size for randomized studies of one-to-one 
adult tutoring reported by Glass, Cahen, Smith, and Filby (1982) was 
0.62 (see Slavin, 1984). If the effects of mastery learning instruction 
approach or exceed those for one-to-one tutoring, then mastery learning is 
indeed a highly effective instructional method. 

The purpose of the present article is to review the research on the 
effects of group-based mastery learning on the achievement of elementary and 
secondary students in an attempt to understand the validity and the practi- 
cal implications of these findings. The review uses a method for synthesiz- 
ing large literatures called "best-evidence synthesis" (Slavin, 1986), which 
combines the use of effect size as a common metric of treatment effect with 
narrative review procedures. Before synthesizing the "best evidence" on 
practical applications of mastery learning, the following sections discuss 
the theory on which group-based mastery learning is based, how that theory 
is interpreted in practice, and problems inherent to research on the 
achievement effects of mastery learning. 

Mastery Learning in Theory and Practice 

The theory on which mastery learning is based is quite compelling. Particularly 
in such hierarchically organized subjects as mathematics, reading, 
and foreign language, failure to learn prerequisite skills is likely to 
interfere with students' learning of later skills. For example, if a student 
fails to learn to subtract, he or she is sure to fail in learning long 
division. If instruction is directed toward ensuring that nearly all students 
learn each skill in a hierarchical sequence, then students will have 
the prerequisite skills necessary to enable them to learn the later skills. 
Rather than accepting the idea that differences in student aptitudes will 
lead to corresponding differences in student achievement, mastery learning 
theory holds that instructional time and resources should be used to bring 
all students up to an acceptable level of achievement. Put another way, 
mastery learning theorists suggest that rather than holding instructional 
time constant and allowing achievement to vary (as in traditional instruction), 
achievement level should be held constant and time allowed to vary 
(see Bloom, 1968; Carroll, 1963). 

In an extreme form, the central contentions of mastery learning theory 
are almost tautologically true. If we establish a reasonable set of learn- 
ing objectives and demand that every student achieve them at a high level 
regardless of how long that takes, then it is virtually certain that all 
students will ultimately achieve that criterion. For example, imagine that 
students are learning to subtract two-digit numbers with renaming. A 
teacher might set a mastery criterion of 80% on a test of two-digit subtraction. 
After some period of instruction, the class is given a formative 
test, and let's say half of the class achieves at the 80% level. The 
teacher might then work with the "non-masters" group for one or more periods, 
and then give a parallel test. Say that half of the remaining students 
pass this time (25% of the class). If the teacher continues this 
cycle indefinitely, then all or almost all students will ultimately learn 
the skill, although it may take a long time for this to occur. Such a 
procedure would also accomplish two central goals of mastery learning, 
particularly as explicated by Bloom (1976): to reduce the variation in student 
achievement and to reduce or eliminate any correlation between aptitude and 
achievement. Since all students must achieve at a high level on the 
subtraction objective but students who achieve the criterion early cannot go on 
to new material, there is a ceiling effect built in to the procedure which 
will inherently cause variation among students to be small and correspondingly 
reduce the correlation between mathematics aptitude and subtraction 
performance. In fact, if we set the mastery criterion at 100% and repeated 
the formative test-corrective instruction cycle until all students achieved 
this criterion, then the variance on the subtraction test would be zero, as 
would the correlation between aptitude and achievement. 
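
The subtraction example can be traced with a small Python sketch. It assumes, purely for illustration, that the same share of the remaining non-masters passes on every parallel test (one half, as in the example); the function names are hypothetical, and real classes need not behave this regularly.

```python
def fraction_mastered(corrective_cycles, pass_rate=0.5):
    """Share of the class at criterion after the initial formative test plus
    the given number of corrective cycles.

    Assumption: a fixed `pass_rate` of the remaining non-masters passes each test.
    """
    return 1 - (1 - pass_rate) ** (corrective_cycles + 1)


def cycles_needed(target, pass_rate=0.5):
    """Smallest number of corrective cycles after which at least `target`
    (a proportion below 1) of the class has reached the criterion."""
    if not 0 < pass_rate <= 1 or not 0 <= target < 1:
        raise ValueError("pass_rate must be in (0, 1] and target in [0, 1)")
    cycles = 0
    while fraction_mastered(cycles, pass_rate) < target:
        cycles += 1
    return cycles


if __name__ == "__main__":
    for n in range(5):
        print(n, "corrective cycles ->", fraction_mastered(n))
    print("cycles needed for 95% of the class:", cycles_needed(0.95))
```

With a pass rate of one half, the initial test leaves 50% of the class at criterion and one corrective cycle raises this to 75%, as in the text. Each further cycle halves the remaining group, which is why nearly all students eventually pass but only after many cycles.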

However, this begs several critical questions. If some students take 
much longer than others to learn a particular objective, then one of two 
things must happen. Either corrective instruction must be given outside of 
regular class time, or students who achieve mastery early on will have to 
waste considerable amounts of time waiting for their classmates to catch up. 
The first option, extra time, is expensive and difficult to arrange, as it 
requires that teachers be available outside of class time to work with the 
non-masters and that some students spend a great deal more time on any particular 
subject than they do ordinarily. The other option, putting rapid 
masters on hold with "enrichment" or "lateral extension" activities while 
corrective instruction is given, is unlikely to be beneficial for these stu- 
dents. For all students mastery learning poses a dilemma, a choice between 
content coverage and content mastery (see Arlin, 1984a; Mueller, 1976; Resnick, 
1977). It may often be the case that even for low achievers, spending 
the time to master each objective may be less productive than covering more 
objectives (see, for example, Cooley & Leinhardt, 1980). 
Problems Inherent to Mastery Learning Research 

The nature of mastery learning theory and practice creates thorny prob- 
lems for research on the achievement effects of mastery learning strategies. 
These problems fall into two principal categories: Unequal time and unequal 
objectives. 

Unequal time. One of the fundamental propositions of mastery learning 
theory is that learning should be held constant and time should be allowed 
to vary, rather than the opposite situation held to exist in traditional 
instruction. However, if the total instructional time allocated to a parti- 
cular subject is fixed, then a common level of learning for all students 
could only be achieved by taking time away from high achievers to increase 
it for low achievers, a leveling process that would in its extreme form be 
repugnant to most educators (see Arlin, 1982, 1984b; Arlin & Westbury, 1976; 
Fitzpatrick, 1985; Smith, 1981). 

To avoid what Arlin (1984) calls a "Robin Hood" approach to time allocation 
in mastery learning, many applications of mastery learning provide corrective 
instruction during times other than regular class time, such as during 
lunch, recess, or after school (see Arlin, 1982). In short-term 
laboratory studies, the extra time given to students who need corrective 
instruction is often substantial. For example, Arlin & Webster (1983) con- 
ducted an experiment in which students studied a unit on sailing under mas- 
tery or non-mastery conditions for four days. After taking formative tests, 
mastery learning students who did not achieve a score of 80% received indi- 
vidual tutoring during times other than regular class time. Non-mastery 
students took the formative tests as final quizzes, and did not receive 
tutoring. 



The mastery learning students achieved at twice the level of non-mastery 
students in terms of percent correct on daily chapter tests, an effect size 
of more than 3.0. However, mastery learning students spent more than twice 
as much time learning the same material. On a retention test taken four 
days after the last lesson, mastery students retained more than non-mastery 
students (effect size .70). However, non-mastery students retained far 
more per hour of instruction than did mastery learning students (ES = 
-1.17). 

In recent articles published in Educational Leadership and the Educational 
Researcher, Benjamin Bloom (1984a, b) noted that several dissertations 
done by his graduate students at the University of Chicago found 
effect sizes for mastery learning of one sigma or more (i.e., one standard 
deviation or more above the control group's mean). In all of these, correc- 
tive instruction was given outside of regular class time, increasing total 
instructional time beyond that allocated to the control groups. The addi- 
tional time averaged 20-33% of the initial classroom instruction, or about 
one day per week. For example, in a two-week study in Malaysia by Nordin 
(1980) an extra period for corrective instruction was provided to the mas- 
tery learning classes, while control classes did other school work unrelated 
to the units involved in the study. A three-week study by Anania (1981) set 
aside one period each week for corrective instruction. In a study by Leyton 
(1983), students received 2-3 periods of corrective instruction for every 
2-3 weeks of initial instruction. All of the University of Chicago dissertations 
cited by Bloom (1984a, b) provided the mastery learning classes 
with similar amounts of additional instruction (Burke, 1983; Levin, 1979; 
Mevarech, 1980; Tenenbaum, 1982). 
In discussing the practicality of mastery learning, Bloom (1984a, p. 9) 
states that "the time or other costs of the mastery learning procedures 
have usually been very small." It may be true that school districts could 
in theory provide tutors to administer corrective instruction outside of 
regular class time; the costs of doing so would hardly be "very small," but 
cost or cost-effectiveness is not at issue here. As a question of 
experimental design, however, the extra time often given to mastery learning 
classes is a serious problem. It is virtually unheard-of in educational research 
outside of the mastery learning tradition to systematically provide an 
experimental group with more instructional time than a control group; presumably, 
any sensible instructional program would produce significantly 
greater achievement than a control method which involved 20-33% less 
instructional time. 

It might be argued that mastery learning programs which provide corrective 
instruction outside of regular class time produce effects which are 
substantially greater per unit time than those associated with traditional 
instruction. However, computing "learning per unit time" is not a 
straightforward process. In the Arlin and Webster (1983) experiment discussed 
earlier, mastery learning students passed about twice as many items on 
immediate chapter tests as did control students, and the time allocated to the 
mastery learning students was twice that allocated to control. Thus, the 
"learning per unit time" was about equal in both groups. Yet on a retention 
test only four days later, the items passed per unit time were considerably 
higher for the control group. Which is the correct measure of learning per 
unit time, that associated with the chapter tests or that associated with 
the retention test? 
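
The ambiguity can be made concrete with a short Python sketch. The hours and item counts below are hypothetical values chosen only to reproduce the pattern just described (twice the items in twice the time on the immediate test, a smaller advantage at retention); they are not the figures reported by Arlin and Webster (1983).

```python
def items_per_hour(items_passed, hours):
    """Items passed divided by instructional hours: one crude way to
    express "learning per unit time"."""
    return items_passed / hours


# Hypothetical instructional time: mastery classes get twice the control time.
HOURS = {"control": 4.0, "mastery": 8.0}

# Hypothetical items passed on each measure.
IMMEDIATE = {"control": 10, "mastery": 20}
RETENTION = {"control": 9, "mastery": 12}


def rates(items_by_group):
    """Items passed per hour for each group on a single measure."""
    return {group: items_per_hour(items, HOURS[group])
            for group, items in items_by_group.items()}


if __name__ == "__main__":
    print("immediate test:", rates(IMMEDIATE))
    print("retention test:", rates(RETENTION))
```

With these illustrative numbers the two groups have equal rates on the immediate test, while the control group has the higher rate on the retention test, so the ranking of the treatments depends entirely on which outcome is placed in the numerator.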



Many mastery learning theorists (e.g., Block, 1972; Bloom, 1976; Guskey, 
1985) have argued that the "extra time" issue is not as problematic as it 
seems, because the time needed for corrective instruction should diminish 
over time. The theory behind this is that by ensuring that all students 
have mastered the prerequisite skills for each new unit, the need for cor- 
rective instruction on each successive unit should be reduced. A few brief 
experiments using specially constructed, hierarchically organized curriculum 
materials have demonstrated that over as many as three successive units, 
time needed for corrective instruction does in fact diminish (Anderson, 
1976; Arlin, 1973; Block, 1972). However, Arlin (1984) examined time-to- 
mastery records for students involved in a mastery learning program over a 
four-year period. In the first grade, the ratio of average time to mastery 
for the slowest 25% of students to that for the fastest 25% was 2.5 to 1. 
Rather than decreasing, as would have been predicted by mastery learning 
theorists, this ratio increased over the four year period. By the fourth 
grade, the ratio was 4.2 to 1. Thus, while it is theoretically possible 
that mastery learning procedures may ultimately reduce the need for correc- 
tive instruction, no evidence from long-term practical applications of mas- 
tery learning supports this possibility at present. 

It should be noted that many studies of mastery learning do hold total 
instruction time more or less constant across experimental and control 
conditions. In discussing the "best evidence" on practical applications of 
mastery learning, issues of time for corrective instruction will be explored 
further. 
Unequal objectives. An even thornier problem posed by research on mastery 
learning revolves around the question of achievement measures used as 
dependent variables. Most studies of mastery learning use experimenter-made 
summative achievement tests as the criterion of learning effects. The danger 
inherent in the use of such tests is that they will correspond more 
closely to the curriculum taught in the mastery learning classes than to 
that taught in control classes. Some articles describing mastery learning 
experiments (e.g., Kersh, 1970; Lueckemeyer & Chiappetta, 1981) describe 
considerable efforts to ensure that experimental and control classes were 
pursuing the same objectives, and many studies administer the formative 
tests used in the mastery learning classes as quizzes in the control 
classes, which in theory should help focus the control classes on the same 
objectives. On the other hand, many other studies specified that students 
used the same texts and other materials but did not use formative tests in 
the control group or otherwise focus the control groups on the same objec- 
tives as those pursued in the mastery learning classes (e.g., Cabezon, 1984; 
Crotty, 1975). 

The possibility that experimenter-made tests will be biased toward the 
objectives taught in experimental groups exists in all educational research 
which uses such tests, but it is particularly problematic in research on 
mastery learning, which by its nature focuses teachers and students on a 
narrow and explicitly defined set of objectives. When careful control of 
instruction methods, materials, and tests is not exercised, there is always 
a possibility that the control group is learning valuable information or 
skills not learned in the mastery learning group but not assessed on the 
experimenter-made measure. 



Even when instructional objectives are carefully matched in experimental 
and control classes, use of experimenter-made tests keyed to what is taught 
in both classes can introduce a bias in favor of the mastery learning treatment. 
As noted earlier, when time for corrective instruction is provided 
within regular class time (rather than after class or after school), mastery 
learning trades coverage for mastery. The overall effects of this trade 
must be assessed using broadly based measures. What traditional whole-class 
instruction is best at, at least in theory, is covering material. Mastery 
learning proponents point out that material covered is not necessarily 
material learned. This is certainly true, but it is just as certainly true 
that material not covered is material not learned. Holding mastery learning 
and control groups to the same objectives in effect finesses the issue of 
instructional pace by only measuring the objectives that are covered by the 
mastery learning classes. If the control classes in fact cover more objec- 
tives, or could have done so had they not been held to the same pace as the 
mastery learning classes, this would not be registered on the 
experimenter-made test. 

Two studies clearly illustrate the problems inherent in the use of 
experimenter-made tests to evaluate mastery learning. One is a year-long study 
of mastery learning in grades 1-6 by Anderson, Scott, and Hutlock (1976), 
which is described in detail later in this review. On experimenter-made 
math tests the mastery learning classes significantly exceeded control at 
every grade level (mean effect size = +.64). On a retention test adminis- 
tered three months later the experimental-control differences were still 
substantial (ES = +.49). However, the experimenters also used the mathematics 
scales from the standardized California Achievement Test as a dependent 
variable. On this test the experimental-control differences were effectively 
zero (ES = +.04). 

A study by Taylor (1973) in ninth grade algebra classes, although not 
strictly speaking a study of mastery learning, nevertheless illustrates 
the dilemma involved in the use of experimenter-made tests in evaluation of 
mastery learning programs. At the beginning of the semester, students in 
the experimental classes were each given a copy of a "minimal essential 
skills" test, and were told that to pass the course they would need to 
obtain a score of at least 80% on a parallel form of the test. About three 
weeks before the end of the semester, another parallel form of the final 
test was administered to students, and the final three weeks was spent on 
remedial work and retesting for students who needed it (while other students 
worked on enrichment activities). At the end of the semester the final test 
was given. A similar procedure was followed for the second semester. 

Experimenter-made as well as standardized measures were used to assess 
the achievement effects of the program. On the minimum essential skills 
section of the experimenter-made test, scores averaged 87.3% correct, dra- 
matically higher than they had been on the same test in the same schools the 
previous year (55.4%). On a section of the experimenter-made test covering 
skills "beyond, but closely related to, minimum essentials," differences 
favoring the experimental classes were still substantial, 44.6% correct vs. 
29.2%. Differences on the minimum essentials subtest of the standardized 
Cooperative Algebra Test also favored the experimental group (ES = +.47). 
However, on the section of the standardized test covering skills beyond 
minimum essentials, the control group exceeded the experimental group (ES = 
-.25). 



The Taylor (1973) intervention does not qualify as mastery learning 
because it involved only one feedback-corrective instruction cycle per sem- 
ester. However, the study demonstrates a problem characteristic of mastery 
learning studies which use experimenter-made tests as dependent measures. 
Had Taylor only used the experimenter-made test, his study would have 
appeared to provide overwhelming support for the experimental procedures. 
However, the results for the standardized tests indicated that students in 
the control group (the previous year) were learning materials that did not 
appear on the experimenter-made tests. The attention and efforts of teach- 
ers as well as students were focused on a narrow set of instructional objec- 
tives which constituted only about 30% of the items on the broader-based 
standardized measure. 

These observations concerning problems in the use of experimenter-made 
measures do not imply that all studies which use them should be ignored. 
Rather, they are meant to suggest extreme caution and careful reading of 
details of each such study before conclusions are drawn. 



Methods 



This review uses a method called "best-evidence synthesis," procedures 
described by Slavin (1986) for synthesizing large literatures in social science. 
This section, "Methods," outlines the specific procedures used in 
preparing the review, including such issues as how studies were located, 
which were selected for inclusion, how effect sizes were computed, how studies 
were categorized, and how the question of pooling of effect sizes was 
handled. 

Literature Search Procedures 

The first step in conducting the best-evidence synthesis was to locate as 
complete as possible a set of studies of mastery learning. Several sources 
of references were used. The ERIC system and Dissertation Abstracts pro- 
duced hundreds of citations in response to the keywords "mastery learning." 
Additional sources of citations included a bibliography of mastery learning 
studies compiled by Hymel (1982), earlier reviews and meta-analyses on mas- 
tery learning, and references in the primary studies. Papers presented at 
the American Educational Research Association meetings since 1976 were soli- 
cited from their authors. Dissertations were ordered from University Micro- 
films and from the University of Chicago, which does not cooperate with 
University Microfilms. 

Criteria for Study Inclusion 

The studies on which this review is primarily based had to meet a set of 
a priori criteria with respect to germaneness and methodological adequacy. 



Germaneness. To be considered germane to the review, all studies had to 
evaluate group-based mastery learning programs in regular (i.e., non-special) 
elementary and secondary classrooms. "Group-based mastery learning" 
was defined as any instructional method which had the following characteris- 
tics: 

1. Students were tested on their mastery of instructional objectives at 
least once each month. A mastery criterion was set (e.g., 80% correct) 
and students who did not achieve this criterion on an initial formative 
test received corrective instruction and a second formative or summative 
test. This cycle could be repeated one or more times. Studies were 
included regardless of the form of corrective instruction used and 
regardless of whether corrective instruction was given during or outside 
of regular class time. 

2. Before each formative test, students were taught as a total group. This 
requirement excluded studies of individualized or continuous progress 
forms of mastery learning and studies of the Personalized System of 
Instruction. However, studies in which mastery learning students worked 
on individualized materials as corrective (not initial) instruction were 
included. 

3. Mastery learning was the only or principal intervention. This excluded 
comparisons such as those in two studies by Mevarech (1985a, b) evaluating 
a combination of mastery learning and cooperative learning, and comparisons 
involving enhancement of cognitive entry behaviors (e.g., Leyton, 
1983). 
Studies evaluating programs similar to mastery learning but conducted 
before Bloom (1968) described it were excluded (e.g., Rankin, Anderson, & 
Bergman, 1936). Other than this, no restrictions were placed on sources or 
types of publications. Every attempt was made to locate dissertations, ERIC 
documents, and conference papers as well as published materials. 

Methodological Adequacy. Criteria for methodological adequacy were as 
follows. 

1. Studies had to compare group-based mastery learning programs to traditional 
group-paced instruction not using the feedback-corrective cycle. A 
small number of studies (e.g., Katims & Jones, 1985; Levine & Stark, 1982; 
Strassler & Rochester, 1982) which compared achievement under mastery learning 
to that during previous years (before mastery learning was introduced) 
were excluded, on the basis that changes in grade-to-grade promotion policies, 
curriculum alignment, and other trends in recent years make year-to-year 
changes difficult to ascribe to any one factor. 

2. Evidence had to be given that experimental and control groups were 
initially equivalent, or the degree of non-equivalence had to be quantified
and capable of being adjusted for in computing effect sizes. This excluded
a small number of studies which failed to either give pretests or to randomly
assign students to treatments (e.g., Dillashaw & Okey, 1983).

3. Study duration had to be at least four weeks (20 hours). This restriction
excluded a large number of brief, often quite artificial experiments.
The reason for it was to concentrate the review on mastery learning
procedures that could in principle be used over extended time periods;
procedures which teachers might be able to sustain for a week or two but not
longer were thus excluded. One four-week study by Strasler (1979) was
excluded on the basis that it was really two two-week studies on two completely
unrelated topics, ecology and geometry. The four-week requirement
caused by far the largest amount of exclusion of studies included in previous
reviews and meta-analyses. For example, of 25 elementary and secondary
achievement studies cited by Guskey and Gates (1985), eleven (with a median
duration of one week) were excluded by this requirement.

4. At least two experimental and two control classes and/or teachers had
to be involved in the study. This excluded a few studies (e.g., Collins,
1971; Leyton, 1983; Long, Okey, & Yeany, 1981; Mevarech, 1985a; Tenenbaum,
1982) in which treatment effects were completely confounded with teacher/
class effects. Also excluded were a few studies in which several teachers
were involved but each taught a different subject (Guskey, 1982, 1984;
Rubovits, 1975). Because it would be inappropriate to compute effect sizes
across the different subjects, these studies were seen as a set of two-class
comparisons, each of which confounded teacher and class effects with
treatment effects.

5. The achievement measure used had to be an assessment of objectives 
taught in control as well as experimental classes. This requirement was 
liberally interpreted, and excluded only one study, a dissertation by Froe- 
mel (1980) in which the mastery learning classes' summative tests were used 
as the criterion of treatment effects and no apparent attempt was made to 
see that the control classes were pursuing the same objectives. In cases in
which it was unclear to what degree control classes were held to the same
objectives as experimental classes and experimenter-made measures were used,
the studies were included. These studies are identified and discussed later
in this review, and their results should be interpreted with a great deal of
caution.

Also excluded were studies which used grades as the only dependent mea- 
sures (e.g., Mathews, 1982; Wortham, 1980). In group-based mastery learn- 
ing, grades are increased as part of the treatment, as students have oppor- 
tunities to take tests over to try to improve their scores. They are thus 
not appropriate as measures of the achievement effects of the program. 
Similarly, studies which used time on-task as the only dependent measure 
were excluded (e.g., Fitzpatrick, 1985). 
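
The methodological criteria above amount to a screening checklist, which the following Python sketch makes explicit. The `Study` record, its field names, and the wording of the reasons are hypothetical conveniences for illustration; the review itself applied these criteria by judgment (criterion 5, for instance, was "liberally interpreted"), not by a mechanical filter.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Study:
    """Hypothetical record of one candidate study; field names are illustrative."""
    concurrent_control_group: bool        # criterion 1
    initial_equivalence_documented: bool  # criterion 2
    duration_weeks: float                 # criterion 3
    experimental_classes: int             # criterion 4
    control_classes: int                  # criterion 4
    measure_fair_to_control: bool         # criterion 5
    achievement_measured: bool            # False if only grades or time on-task


def exclusion_reasons(study: Study) -> List[str]:
    """Return the criteria a study fails; an empty list means it is included."""
    reasons = []
    if not study.concurrent_control_group:
        reasons.append("no concurrent traditionally taught control group")
    if not study.initial_equivalence_documented:
        reasons.append("initial equivalence not established or adjustable")
    if study.duration_weeks < 4:
        reasons.append("duration under four weeks")
    if study.experimental_classes < 2 or study.control_classes < 2:
        reasons.append("fewer than two classes per condition")
    if not study.measure_fair_to_control:
        reasons.append("measure not tied to objectives taught in control classes")
    if not study.achievement_measured:
        reasons.append("only grades or time on-task reported")
    return reasons


# An invented one-week experiment with one class per condition.
brief = Study(True, True, 1, 1, 1, True, True)
print(exclusion_reasons(brief))
```

Returning every failed criterion, rather than stopping at the first, mirrors the way the text reports several studies as excluded on more than one ground.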

Computation of Effect Sizes 

The size and direction of effects of mastery learning on student achievement
are presented throughout this review in terms of effect size. Effect
size, as described by Glass et al. (1981), is the difference between experi- 
mental and control posttest means divided by the control group's posttest 
standard deviation. However, this formula was adapted in the present review 
to take into account pretest or ability differences between the experimental 
and control groups. If pretests were available, then the formula used was 
the difference in experimental and control gains divided by the control 
group's posttest standard deviation. If ability measures rather than pretests
were presented, then the experimental-control difference on these measures
(divided by the control group's standard deviation) was subtracted
from the posttest effect size. The reason for these adjustments is that in
studies of achievement posttest scores are so dependent on pretest levels
that any pretest differences are likely to be reflected in posttests,
correspondingly inflating or deflating effect sizes computed on posttests
alone.
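
The three computations just described can be written out as a short Python sketch. The function and argument names are hypothetical, and the numbers in the example are invented for illustration; they are not drawn from any study in the review.

```python
def posttest_effect_size(exp_post, ctrl_post, ctrl_post_sd):
    """Unadjusted effect size: posttest mean difference over the control SD."""
    return (exp_post - ctrl_post) / ctrl_post_sd


def gain_adjusted_effect_size(exp_pre, exp_post, ctrl_pre, ctrl_post, ctrl_post_sd):
    """Difference in pre-to-post gains, divided by the control posttest SD."""
    exp_gain = exp_post - exp_pre
    ctrl_gain = ctrl_post - ctrl_pre
    return (exp_gain - ctrl_gain) / ctrl_post_sd


def ability_adjusted_effect_size(exp_post, ctrl_post, ctrl_post_sd,
                                 exp_ability, ctrl_ability, ctrl_ability_sd):
    """Posttest effect size minus the standardized ability difference."""
    ability_gap = (exp_ability - ctrl_ability) / ctrl_ability_sd
    return posttest_effect_size(exp_post, ctrl_post, ctrl_post_sd) - ability_gap


# Invented means: the experimental group starts 2 points ahead on the pretest.
unadjusted = posttest_effect_size(55, 50, 10)
adjusted = gain_adjusted_effect_size(42, 55, 40, 50, 10)
print(f"{unadjusted:+.2f} {adjusted:+.2f}")  # +0.50 +0.30
```

In this invented case the unadjusted posttest comparison overstates the treatment effect, because part of the posttest difference was already present at pretest.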

Because individual-level standard deviations are usually of concern in
mastery learning research, most studies which met other criteria for inclusion
presented data sufficient for direct computation of effect size. In
many studies, data analyses used class means and standard deviations, but
individual-level standard deviations were also presented. In every case the
individual-level standard deviations were used to compute effect sizes;
class-level standard deviations are usually much smaller than individual- 
level SD's, inflating effect size estimates. Also, note that the control 
group standard deviation, not a pooled standard deviation, was always used, 
as mastery learning often has the effect of reducing achievement standard 
deviations. 
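
A small numerical sketch shows why the choice of standard deviation matters. The scores below are invented, and the 5-point treatment difference is an assumption made for illustration; the code uses only Python's standard `statistics` module.

```python
from statistics import mean, stdev

# Invented control-group scores for three classes of three students each.
control_classes = [[40, 50, 60], [45, 55, 65], [50, 60, 70]]
mean_difference = 5.0  # assumed experimental-minus-control difference

# Individual-level SD: spread of all nine student scores.
individual_scores = [score for cls in control_classes for score in cls]
individual_sd = stdev(individual_scores)

# Class-level SD: spread of the three class means only.
class_means = [mean(cls) for cls in control_classes]
class_sd = stdev(class_means)

individual_es = mean_difference / individual_sd
class_es = mean_difference / class_sd
print(f"{individual_es:.2f} {class_es:.2f}")  # 0.52 1.00
```

Averaging within classes removes student-to-student variation, so the same 5-point difference looks roughly twice as large when divided by the class-level standard deviation.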

In the few cases in which data necessary for computing effect sizes were
lacking in studies which otherwise met criteria for inclusion, the studies'
results were indicated in terms of their direction and statistical
significance.

Research on Achievement Effects of Group-Based Mastery Learning

What are the effects of group-based mastery learning on the achievement
of elementary and secondary students? In essence, there are three claims
that proponents of mastery learning might make for the effectiveness of
mastery learning. These are as follows:

1. Mastery learning is more effective than traditional instruction
even when instructional time is held constant and fair achievement
measures are used.

This might be called the "strong claim" for the achievement effects of
mastery learning. It is clear, at least in theory, that if mastery learning
procedures greatly increase allocated time for instruction by providing
enough additional time for corrective instruction to bring all students to a
high level of mastery, then mastery learning students will achieve more than
traditionally taught control students. But it is less obviously true that
the additional time for corrective instruction is more productive in terms
of student achievement than it would be to simply increase allocated time
for the control students. The "strong claim" asserts that time used for
corrective instruction (along with the other elements of mastery learning)
is indeed more productive than time used for additional instruction to the
class as a whole.

Similarly, it is clear (in theory) that if students who experienced mastery
learning are tested on the specific objectives they studied, they will
score higher on those objectives than will students who were studying similar
but not identical objectives. Further, it is likely that even if mastery
learning and control classes are held to precisely the same objectives
but the control classes are not allowed to move ahead if they finish those
objectives before their mastery learning counterparts do, then the traditional
model is deprived of its natural advantage, the capacity to cover
material rapidly. A "fair" measure of student achievement in a mastery
learning experiment would have to register both coverage and mastery, so
that if the control group covered more objectives than the mastery learning
group its learning of these additional objectives would be registered. The
"strong claim" would hold that even allowing control classes to proceed at
their own rate and even using such a "fair" achievement measure, mastery
learning would produce more achievement than control methods.

The best evidence for the "strong claim" would probably come from studies
in which mastery learning and control classes studied precisely the same
objectives using the same materials and lessons and the same amount of
allocated time, but in which teachers could determine their own pace of
instruction and achievement measures covered the objectives reached by the
fastest-moving class. Unfortunately, such studies are not known to exist. However,
a good approximation of these experimental design features is achieved by
studies which hold allocated time constant and use standardized tests as the
criterion of achievement. Assuming that curriculum materials are not
specifically keyed to the standardized tests in either treatment, these tests
offer a means of registering both mastery and coverage. In such basic
skills areas as mathematics and reading, the standardized tests are likely
to have a high overlap with the objectives pursued by mastery learning
teachers as well as by control teachers.

2. Mastery learning is an effective means of ensuring that teachers
adhere to a particular curriculum and students learn a specific set
of objectives (the "curricular focus" claim).

A "weak claim" for the effectiveness of mastery learning would be that
these methods focus teachers on a particular set of objectives which is held
to be superior to those which might have been pursued by teachers on their
own. This might be called the "curricular focus" claim. For example,
consider a survey course on U.S. history. Left to their own devices, some
teachers might teach details about individual battles of the Civil War;
others might entirely ignore the battles and focus on the economic and
political issues; and still others might approach the topic in some third way,
combine both approaches, or even teach with no particular plan of action. A
panel of curriculum experts might determine that there is a small set of
critical understandings about the Civil War that all students should have,
and they might devise a criterion-referenced test to assess these
understandings. If it can be assumed that the experts' judgments are indeed
superior to those of individual teachers, then teaching to this test may not
be inappropriate, and mastery learning may be a means of holding students
and teachers to the essentials, relegating other concepts they might have
learned (which are not on the criterion-referenced test) to a marginal
status. It is no accident that mastery learning grew out of the behavioral
objectives/criterion-referenced testing movement (see Bloom, Hastings, &
Madaus, 1971); one of the central precepts of mastery learning is that once
critical objectives are identified for a given course, then students should
be required to master those and only those objectives. Further, it is
interesting to note that in recent years the mastery learning movement has
often allied itself with the "curriculum alignment" movement, which seeks to
focus teachers on objectives that happen to be contained in district- and/or
state-level criterion-referenced minimum competency tests as well as
norm-referenced standardized tests (see Levine, 1985).

The "curricular focus" claim, that mastery learning may help focus teachers
and students on certain objectives, is characterized here as a "weak
claim" because it requires a belief that the objectives pursued by the mastery
learning program represent the totality of the subject at hand, and
that all other (unmeasured) objectives are essentially worthless. Critics 
(e.g., Resnick, 1977) point out with some justification that a focus on a 
well-defined set of minimum objectives may place a restriction on the maxi- 
mum that students might have achieved. However, in certain circumstances it 
may well be justifiable to hold certain objectives to be essential to a 
course of study, and mastery learning may represent an effective means of 
ensuring that nearly all students have attained these objectives. 

The best evidence for the "curricular focus" claim would come from studies
in which curriculum experts formulated a common set of objectives to be
pursued equally by mastery learning and control teachers within an equal
amount of allocated time. If achievement on the criterion-referenced
assessments were higher in mastery learning than in control classes, then we
could at least make the argument that the mastery learning students have
learned more of the essential objectives, even though the control group may
have learned additional, presumably less essential concepts.

3. Mastery learning is an effective use of additional time and
instructional resources to bring almost all students to an acceptable
level of achievement (the "extra time" claim).

A second "weak claim" would be that given the availability of additional
teacher and student time for corrective instruction, mastery learning is an
effective means of ensuring all students a minimal level of achievement. As
noted earlier, in an extreme form this "extra time" claim is almost
axiomatically true. Leaving aside cases of serious learning disabilities, it
should certainly be possible to ensure that virtually all students can
achieve a minimal set of objectives in a new course if an indefinite amount
of one-to-one tutoring is available to students who initially fail to pass
formative tests. However, it may be that even within the context of the
practicable, providing students with additional instruction if they need it
will bring almost all to a reasonable level of achievement.

The reason that this is characterized here as a "weak claim" is that it
begs the question of whether the additional time used for corrective
instruction is the best use of additional time. What could the control
classes do if they also had more instructional time? However, the "extra
time" issue is not a trivial one, as it is not impossible to routinely
provide corrective instruction to students who need it outside of regular class
time. For example, this might be an effective use of Chapter I or special
education resource pull-outs, a possibility that is discussed later.

The best evidence for this claim would come from studies which provided
mastery learning classes with additional time for corrective instruction and
used achievement tests that covered all topics which could have been studied
by the fastest-paced classes (e.g., standardized tests). However, such
studies are not known to exist; the best existing evidence for the "extra time"
claim is from studies which used experimenter-made achievement measures and
provided corrective instruction outside of class time.

Evidence for the "Strong Claim"

Table 1 summarizes the major characteristics and findings of seven mas- 
tery learning studies which met the inclusion criteria discussed earlier, 
provided equal time for experimental and control classes, and used standard- 
ized measures of achievement. 



Table 1 Here 



Table 1 clearly indicates that the effects of mastery learning on
standardized achievement measures are extremely small, at best. The median
effect size across all seven studies is essentially zero (ES = +.04). The
only study with a non-trivial effect size (ES=+.25), a semester-long
experiment in inner-city Chicago elementary schools by Katims, Smith, Steele, &
Wick (1977), also had a serious design flaw. Teachers were allowed to
select themselves into mastery learning or control treatments or were
assigned to conditions by their principals. It is entirely possible that
the teachers who were most interested in using the new methods and
materials, or those who were named by their principals to use the new program,
were better teachers than were the control teachers. In any case, the
differences were not statistically significant when analyzed at the class
level, were only marginally significant (p=.071) for individual-level gains,
and amounted to an experimental-control difference of only 11% of a grade
equivalent.

The Katims et al. (1977) study used a specially developed set of materials
and procedures which became known as the Chicago Mastery Learning Reading
program, or CMLR. This program provides teachers with specific instructional
guides, worksheets, formative tests, corrective activities, and
extension materials. A second study of CMLR by Jones, Monsaas, & Katims
(1979) compared matched CMLR and control schools over a full year. This
study found a difference between CMLR and control students on the Iowa Test 
of Basic Skills Reading Comprehension scale that was marginally significant 
at the individual level but quite small (ES=+.09). In contrast, on experi- 
menter-made "end of cycle" tests the mastery learning classes did signifi- 
cantly exceed control (ES=+.18). A third study of CMLR by Katims and Jones 
(1985) did not qualify for inclusion in Table 1 because it compared year-to- 
year gains in grade equivalents rather than comparing experimental to con- 
trol groups. However, it is interesting to note that the difference in 
achievement gains between the cohort of students who used the CMLR program
and those in the previous year who did not was only 0.16 grade equivalents, 
which is similar to the results found in the Katims et al. (1977) and Jones 
et al. (1979) experimental-control comparisons. 

One of the most important studies of mastery learning is the year-long 
Anderson, Scott, and Hutlock (1976) experiment briefly described earlier. 
This study compared students in grades 1-6 in one mastery learning and one
control school in Lorain, Ohio. The school populations were similar, but
there were significant pretest differences at the first and fourth grade
levels favoring the control group. To ensure initial equality in this
non-randomized design, students were individually matched on the Metropolitan
Readiness Test (grades 1-3) or the Otis-Lennon Intelligence Test (grades
4-6). In the mastery learning school, students experienced the form of
mastery learning described by Block and Anderson (1975). The teacher presented
a lesson to the class and then assessed student progress on specific
objectives. "Errors ... were remediated through the use of both large-group and
small-group re-learning and review sessions. After every student had
demonstrated mastery on the formative test for each unit, the class moved on
to the next unit" (Anderson et al., 1976, p. 4).

One particularly important aspect of the Anderson et al. (1976) study is
that it used both standardized tests and experimenter-made,
criterion-referenced tests. The standardized tests were the Computations, Concepts, and
Problem Solving scales of the California Achievement Test. The
experimenter-made test was constructed by the project director (Nicholas Hutlock) to
match the objectives taught in the mastery learning classes. Control
teachers were asked to examine the list of objectives and identify any they did
not teach, and these were eliminated from the test.

The results of the study were completely different for the two types of
achievement tests. On the experimenter-made tests, students in the mastery
learning classes achieved significantly more than did their matched
counterparts at every grade level (mean ES=+.64). A retention test based on the
same objectives was given three months after the end of the intervention
period, and mastery learning classes still significantly exceeded control
(ES=+.49). However, on the standardized tests, these differences were not
registered. Mastery learning students scored somewhat higher than control
on Computations (ES=+.17) and Problem Solving (ES=+.07), but the control
group scored higher on Concepts (ES=-.12).
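
The size of this discrepancy can be made concrete with a small Python sketch using only the effect sizes quoted in the paragraph above. Taking an unweighted mean of the three standardized subscales is an assumption made here for illustration; the source does not state how the subscales were combined for Table 1.

```python
from statistics import mean

# Effect sizes for Anderson et al. (1976) as quoted in the text above.
standardized = {"Computations": 0.17, "Problem Solving": 0.07, "Concepts": -0.12}
experimenter_made = 0.64  # mean ES on the experimenter-made tests

# Assumption: an unweighted mean across the three standardized subscales.
mean_standardized = mean(standardized.values())
gap = experimenter_made - mean_standardized

print(f"{mean_standardized:+.2f} {gap:+.2f}")  # +0.04 +0.60
```

Under that assumption, roughly six-tenths of a standard deviation separates the two kinds of measure within the same study and the same students.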

The Anderson et al. (1976) finding of marked differences in effects on
standardized and experimenter-made measures counsels great caution in
interpreting results of other studies which used experimenter-made measures only.
In a year-long study of mathematics, it is highly unlikely that a
standardized mathematics test would fail to register any meaningful treatment
effect. Therefore, it must be assumed that the strong positive effects
found by Anderson et al. (1976) on the experimenter-made tests are mostly or
entirely due to the fact that these tests were keyed to the mastery learning
classes' objectives. It may be that the control classes covered more
objectives than the mastery learning classes, and that learning of these
additional objectives was registered on the standardized but not the
experimenter-made measures.

Another important study of mastery learning at the elementary level is a
dissertation by Kersh (1971), in which eleven fifth-grade classes were ran- 
domly assigned to mastery learning or control conditions for an entire 
school year. Two schools were involved, one middle-class and one lower- 
class. Students' math achievement was assessed about once each month in the 
mastery learning classes, and peer tutoring, games, and other alternative 
activities were provided to students who did not show evidence of mastery. 
Control classes were untreated. The study results did not favor either
treatment overall on the Stanford Achievement Test's Concepts and
Applications scales. Individual-level effect sizes could not be computed, as only
class-level means and standard deviations were reported. However,
class-level effect sizes were essentially zero in any case (ES=-.06). On an
experimenter-made criterion-referenced test not specifically keyed to the
mastery objectives the results were no more conclusive; class-level effects
slightly favored the control group (ES=-.20). Effects somewhat favored
mastery learning in the lower-class school and favored the control group in the
middle-class school, but since none of the differences approached
statistical significance these trends may just reflect teacher effects or random
variation.

In a recent study by Gutkin (1985), 41 first-grade classes in New York
City were randomly assigned to mastery learning or control treatments. The
article does not describe the mastery learning treatment in detail, except
to note that monthly formative tests were given to assess student progress
through prescribed instructional units. The mastery learning training also
included information on classroom management skills, process-product
research, and performance-based teacher education, and teachers received
extensive coaching, routine feedback from teacher trainers, and scoring
services for formative and summative tests. After one year, mastery
learning-control differences did not approach statistical significance in Total
Reading on the California Achievement Test (ES=+.12). However, effects were
more positive on a Phonics subscale (ES=+.36) than on Reading Vocabulary
(ES=+.04) or Reading Comprehension (ES=+.15). Phonics, with its easily
measurable objectives, may lend itself better to the mastery learning approach
than do reading comprehension or vocabulary.

Studies using standardized measures at the secondary level are no more
supportive of the "strong claim" than are the elementary studies. A 26-week
experiment in inner-city, mostly black, Philadelphia junior and senior high
schools assessed mastery learning in ninth grade "consumer mathematics," a
course provided for students who do not qualify for Algebra I (Slavin &
Karweit, 1984). Twenty-five teachers were randomly assigned to mastery
learning or control treatments, both of which used the same books, worksheets,
and quizzes in the same cycle of activities. However, instructional pace
was not held constant. After each one-week unit (approximately), mastery
learning classes took a formative test, and then any students who did not
achieve a score of at least 80% received corrective instruction from the
teacher while those who did achieve at that level did enrichment activities.
The formative tests were used as quizzes in the control group, and after
taking the quizzes the class went on to the next unit.

Results on a shortened version of the Comprehensive Test of Basic Skills
Computations and Concepts and Applications scales indicated no differences
between mastery learning and control treatments (ES=+.02), and no
interaction with pretest level; neither low nor high achievers benefited from the
mastery learning model. It is interesting to note that there were two other
treatment conditions evaluated in this study, a cooperative learning method
called Student Teams-Achievement Divisions or STAD (Slavin, 1983) and a
combination of STAD and mastery learning. STAD classes did achieve
significantly more than control (ES=+.19), but adding the mastery learning
component to STAD had little additional achievement effect (ES=+.03).

A five-week study by Chance (1980) compared randomly assigned mastery
learning and control methods in teaching reading to students in an
all-black, inner-city New Orleans school. Approximately once each week,
students in the mastery learning groups took formative tests on unit
objectives. If they did not achieve at 80% (on three quizzes) or 90% (on one),
they received tutoring, games, and/or manipulatives to correct their errors
and had three opportunities to pass. No effects for students at any level
of prior performance were found on the Gates-McGinitie Comprehension Test.
However, it may be unrealistic to expect effects on a standardized measure
after only five weeks.

Overall, research on the effects of mastery learning on standardized 
achievement test scores provides little support for the "strong claim" that 
holding time and objectives constant, mastery learning will accelerate stu- 
dent achievement. The studies assessing these effects are not perfect; par- 
ticularly when mastery learning is applied on a fairly wide scale in 
depressed inner-city schools, there is reason to question the degree to 
which the model was faithfully implemented. However, most of the studies 
used random assignment of classes or students to treatments, study durations 
approaching a full school year, and measures which registered coverage as 
well as mastery. Not one of the seven studies found effects of mastery
learning which even reached conventional levels of statistical significance
(even in individual-level analyses), much less educational significance. If
group-based mastery learning had strong effects on achievement in such basic
skills as reading and math, these studies would surely have detected them.

Evidence for the "Curricular Focus" Claim

Table 2 summarizes the principal evidence for the "curricular focus"
claim, that mastery learning is an effective means of increasing student
achievement of specific skills or concepts held to be the critical
objectives of a course of study. The studies listed in the table are those which
(in addition to meeting general inclusion criteria) used experimenter-made,
criterion-referenced measures and apparently provided experimental and
control classes with equal amounts of instructional time. It is important to
note that the distinction between the equal-time studies listed in Table 2
and the unequal-time studies in Table 3 is often subtle and difficult to
discriminate, as many authors did not clarify when or how corrective
instruction was delivered or what the control groups were doing during the
time when mastery learning classes received corrective instruction.



Table 2 Here 



A total of eight studies met the requirements for inclusion in Table 2. 
Three of these (Anderson et al., 1976; Jones et al., 1979; Kersh, 1970) were
studies which used both standardized and experimenter-made measures, and
were therefore also included in Table 1 and discussed earlier. 

All but one (Kersh, 1970) of the studies listed in Table 2 found positive
effects of mastery learning on achievement of specified objectives, with
five studies falling in an effect size range from +.18 to +.27. The overall
median effect size for the seven studies which used immediate posttests is
+.24. However, the studies vary widely in duration, experimental and
control treatments, and other features, so this median value should be
cautiously interpreted.

Fuchs, Tindal, & Fuchs (1985) conducted a small and somewhat unusual
study of mastery learning in rural first-grade reading classes. Students in
four classes were randomly assigned to one of two treatments. In the mastery
learning classes, students were tested on oral reading passages in
their reading groups each week. The whole reading group reviewed each
passage until at least 80% of the students could read the passage correctly at
50 words per minute. The control treatment was held to be the form of "mastery
learning" recommended by basal publishers. These students were given
unit tests every 4-6 weeks, but all students went on to the next unit
regardless of score. Surprisingly, the measure on which mastery learning
classes exceeded control was "end-of-book" tests provided with the basals
(ES=+.35), not passage reading scores which should have been more closely
related to the mastery learning procedures (ES=+.05). On both measures it 
was found that while low achievers benefited from the mastery learning 
approach, high achievers generally achieved more in the control classes. 
Since the control teachers were presumably directing their efforts toward 
the objectives assessed in the end-of-book tests to the same degree as the 
mastery learning teachers, the results on this measure are probably fair 
measures of achievement. However, the Fuchs et al. (1985) study may be more 
a study of the effects of repeated reading than of mastery learning per se. 
Research on repeated reading (e.g., Dahl, 1979) has found this practice to
increase comprehension of text.

Another small and unusual study at the elementary level was reported by
Wyckoff (1974), who randomly assigned four sixth grade classes to experimental
or control conditions for a nine-week anthropology unit. Following
teaching of each major objective, students were quizzed. If the class
median was at least 70% correct, the class moved on to the next objective;
otherwise, those who scored less than 70% received peer tutoring or were
given additional reading or exercises. The control groups used precisely
the same materials, tests, and schedule. The achievement results were not
statistically significant, but they favored the mastery learning classes
(ES=+.24). However, this trend was entirely due to effects on low performing
readers (ES=+.58), not high-ability readers (ES=+.03).
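
The decision rule described for Wyckoff (1974) differs from the usual per-student criterion, because the trigger is the class median and not each student's own score. A minimal Python sketch of that rule follows; the function name, the dictionary of scores, and the student labels are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple


def class_median_rule(scores: Dict[str, float],
                      cutoff: float = 0.70) -> Tuple[bool, List[str]]:
    """Return (class_moves_on, students_assigned_correctives).

    If the class median reaches the cutoff, the whole class moves on;
    otherwise students scoring below the cutoff receive peer tutoring,
    additional reading, or exercises.
    """
    if median(scores.values()) >= cutoff:
        return True, []
    return False, sorted(name for name, s in scores.items() if s < cutoff)


# Invented quiz results for two small classes.
print(class_median_rule({"A": 0.90, "B": 0.75, "C": 0.60}))  # (True, [])
print(class_median_rule({"A": 0.90, "B": 0.65, "C": 0.60}))  # (False, ['B', 'C'])
```

In the first invented class, student C moves on with everyone else despite scoring below 70%, which is one way this rule departs from a strict individual mastery criterion.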

One remarkable study spanning grades 3, 6, and 8 was reported in a dis- 
sertation by Cabezon (1984). The author, the director of the National Cen- 
ter for Curriculum Development in Chile, was charged with implementation of 
mastery learning throughout that country. Forty-one elementary schools
throughout Chile were selected to serve as pilots, and an additional 2,143 
schools began using mastery learning two years later. Three years after the 
pilots had begun, Cabezon randomly selected a sample of schools that had 
been using mastery learning for three years, for one year, or not at all. 
Within each selected school two classes at the third, sixth, and eighth 
grade level were selected. 

The form of mastery learning used was not clearly specified, but teachers
were expected to assess student progress every 2-3 weeks and to provide cor-
rective instruction to those who needed it. Two subjects were involved,
Spanish and mathematics.

Unfortunately, the classes that had used mastery learning for three years
were found to be much higher in socioeconomic status and mean IQ level than 
were control classes. Because of this problem these comparisons did not 
meet the inclusion criteria. However, the classes that had used mastery
learning for one year were comparable to the control classes in SES and only 
slightly higher in IQ. 

The study results, summarized in Table 2, indicated stronger effects of
mastery learning in Spanish than in math, and stronger effects in the early
grades than in later ones, with an overall mean of +0.27. However, while
all teachers used the same books, it is unclear to what degree control
teachers were held to or even aware of the objectives being pursued by the
mastery learning schools. 

Two studies at the secondary level assessed both immediate and long-term 
impacts of mastery learning. One was a study by Lueckemeyer and Chiappetta 
(1981), who randomly assigned tenth graders to six mastery learning or six 
control classes for a six-week human physiology unit. In the mastery learn- 
ing classes, students were given a formative test every two weeks. They 
were then given two days to complete corrective activities for any objec-
tives on which they did not achieve an 80% score, following which they took
a second form of the test, which was used for grading purposes. Students
who achieved the 80% criterion on the first test were given material to read
or games to play while their classmates received corrective instruction.
The control group studied the same material and took the same tests, but did
not receive the two-day corrective sessions. The control teachers were
asked to complete the three two-week units in six weeks, but were not held
to the same schedule as the mastery classes. In order to have time to fit
in the two days for corrective instruction every two weeks, the mastery
learning classes "had to condense instruction . . . and to guard carefully
against any wasted time" (C. L. Lueckemeyer, personal communication, November
4, 1986).

On an immediate posttest the mastery learning classes achieved signifi-
cantly more than the control group (ES=+.39), but on a retention test given
four weeks later the difference had disappeared. The study's authors
reported the statistically significant effects on posttest achievement but
noted that "it is questionable whether such a limited effect on achievement
is worth the considerable time required for the development and management
of such an instructional program" (Lueckemeyer & Chiappetta, 1981, p. 273).
Further, it is unclear whether the control groups required the full six
weeks to cover the material. Any additional information students in the
control group learned (or could have learned) would of course not have been
registered on the experimenter-made test.

In a 15-week experiment in ninth grade chemistry and physics classes by
Dunkelberger and Heikkinen (1984), students were randomly assigned to mas-
tery learning or control classes. In the mastery learning classes students
had several chances to meet an 80% criterion on parallel formative tests.
Control students took the tests once and received feedback on their areas of
strength and weakness. All students, control as well as experimental, had
the same corrective activities available during a regularly scheduled free
time. However, mastery learning students took much greater advantage of
these activities. The total time used by the experimental group was thus
greater than that used by control students, but since the total time availa-
ble was held constant, this was categorized as an equal-time study.

For reasons that were not stated, the implementation of the 15-week chem- 
istry and physics unit was concluded in January, but the posttests were not 
given until June, 4 months later. For this reason the program's effects are 
listed as retention measures only. Effects favored the mastery learning 
classes (ES=+.26). 

Overall, the effects summarized in Table 2 could be interpreted as sup- 
porting the "curricular focus" claim. The effects of mastery learning on 
experimenter-made, criterion-referenced measures are moderate but consis-
tently positive. The only study with an effect size above a modest 0.27 was
the Anderson et al. (1976) study, in which the experimenter-made measures
were specifically keyed to the material studied only by the mastery learning
classes. Two studies found that the effects of mastery learning were great-
est for low achievers, as would be expected from mastery learning theory.

However, the meaning of the results summarized in Table 2 is far from
clear. The near-zero effects of mastery learning on standardized measures 
(Table 1) and in particular the dramatically different results for standard- 
ized and experimenter-made measures reported by Anderson et al. (1976) sug- 
gest that the effects of mastery learning on experimenter-made measures 
result from a shifting of instructional focus to a particular set of objec- 
tives neither more nor less valuable than those pursued by the control 
group. Unfortunately, it is impossible to determine from reports of mastery 
learning studies the degree to which control teachers were focusing on the 
objectives assessed on the experimenter-made measures, yet understanding 
this is crucial to understanding the effects reported in these studies. 



Evidence for the Extra-Time Claim



The problem of unequal time for experimental and control groups is a ser- 
ious one in mastery learning research in general, but the inclusion criteria 
used in the present review have the effect of eliminating the studies in
which time differences are extreme. Mastery learning studies in which
experimental classes receive considerably more instructional time than con-
trol classes are always either very brief, rarely more than a week (e.g.,
Anderson, 1975a, b; Arlin & Webster, 1984), or they involve individualized
or self-paced rather than group-paced instruction (e.g., Jones, 1974; Wen-
tling, 1973). In studies of group-paced instruction conducted over periods
of at least four weeks, extra time for corrective instruction rarely amounts
to more than 20-25% of original time. It might be argued that additional
instructional time of this magnitude might be a practicable means of ensur-
ing all students a reasonable level of achievement, and the costs of such an
approach might not be far out of line with the costs of current compensatory
or special education. 



Table 3 Here 



Table 3 summarizes the characteristics and outcomes of group-based mas- 
tery learning studies in which the mastery learning classes received extra 
time for corrective instruction. All four of the studies in this category 
took place at the secondary level, grades 7-10. Also, these studies are 
distinctly shorter (5-6 weeks) than were most of the studies listed in 
Tables 1 and 2. 



The median effect size for immediate posttests from the five comparisons
in four studies is +.31, but none of three retention measures found signifi-
cant differences (median ES = -.03). However, the four studies differ
markedly in experimental procedures, so these medians have little meaning.

The importance of the different approaches taken in different studies is
clearly illustrated in a study by Long, Okey, & Yeany (1978). In this
study, eighth graders were randomly assigned to six classes, all of which
studied the same earth science units on the same schedule. Two classes
experienced a mastery learning treatment with teacher-directed remediation.
After every two class periods, students in this treatment took a diagnostic
progress test. The teacher assigned students specific remedial work, and
then gave a second progress test. If students still did not achieve at a
designated level (the mastery criterion was not described in the article),
the teacher tutored them individually. In a second treatment condition, 
student-directed remediation, students received the same instruction and 
tests and had the same corrective materials available, but they were asked 
to use their test results to guide their own learning, rather than having 
specific activities assigned. These students did not take the second pro- 
gress test and did not receive tutoring. Students in the third treatment, 
control, studied the same materials on the same schedule but did not take 
diagnostic progress tests. Teachers rotated across the three treatments to 
minimize possible teacher effects. 

The results of the Long et al. (1978) study indicated that the teacher-
directed remediation (mastery learning) group did achieve considerably more
than the control group (ES=+.43), but exceeded the student-directed remedia-
tion group to a much smaller degree (ES=+.19). What this suggests is that
simply receiving frequent and immediate feedback on performance may account
for a substantial portion of the mastery learning effect. A replication by
the same authors (Long, Okey, & Yeany, 1981) failed to meet the inclusion
criteria because it had only one class per treatment. However, it is inter-
esting to note that the replication found the same pattern of effects as the
earlier Long et al. (1978) study; the teacher-directed remediation treatment
had only slightly more positive effects on student achievement than the stu-
dent-directed remediation treatment, but both exceeded the control group.

The Long et al. (1978) study included a retention test, which indicated 
that whatever effects existed at the end of the implementation period had 
disappeared twelve weeks later. Retention is especially important in stu- 
dies in which corrective instruction is given outside of class time, as any 
determination of the cost-effectiveness of additional time should take into
account the lasting impact of the expenditure.

Another extra-time study which assessed retention outcomes was a disser- 
tation by Kagan (1975), who randomly assigned four teachers and their seven- 
teen seventh-grade classes to mastery learning or control treatments. The 
mastery learning treatment essentially followed the sequence suggested by 
Block and Anderson (1975). Students were quizzed at the end of each week, 
and teachers worked with students who failed to reach an 80% criterion, 
after which students took a second formative test. The control classes used 
the same materials and procedures except that they took the formative tests 
as quizzes. Teachers scored the quizzes, returned them to students, and 
then went on to the next unit. The teachers followed the same sequence of
activities, but were allowed to proceed at their own pace. As a result, the
mastery learning classes took 25 days to complete the five units on "trans- 
portation and the environment" while control classes took only 20-21 days. 
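
As a rough check on how much extra time this represents, the day counts reported above can be converted to a proportion of the control classes' time. The function name is hypothetical, and the calculation assumes that class periods were of equal length in both conditions, which the text does not state.

```python
def extra_time_fraction(mastery_days, control_days):
    """Extra instructional days as a fraction of the control classes' days."""
    return mastery_days / control_days - 1


# Mastery classes took 25 days; control classes took 20-21 days.
against_21_days = extra_time_fraction(25, 21)
against_20_days = extra_time_fraction(25, 20)
print(f"{against_21_days:.0%} to {against_20_days:.0%} additional time")
```

Under that assumption the mastery learning classes used roughly one-fifth to one-quarter more days than the control classes.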

Unfortunately, there were pretest differences favoring the control 
classes of approximately 40% of a grade equivalent on Iowa Test of Basic 
Skills vocabulary scores. Analyses of covariance on the posttests found no 
experimental-control differences; in fact, adjusted scores slightly favored 
the control group (ES=-.11). On a four-week retention measure the control
group's advantage was slightly greater (ES=-.15). When experimental treat-
ments vary widely in pretests or covariates, statistical adjustments tend to
underadjust (see Reichardt, 1979), so these results must be interpreted
with caution. However, even discarding the results for the one control
teacher whose classes had high pretest scores, differences still favored the 
control group on the posttest (ES=-.17) and on the retention test (ES=-.23). 

A small study by Hecht (1980) compared mastery learning to control treat- 
ments in tenth grade geometry. Students were randomly assigned to treat- 
ments, and each of two teachers taught mastery learning as well as control 
classes. In the mastery learning classes students were given formative 
tests every two weeks, which were followed by "intensive remedial help for 
those who needed it" (mastery criteria and corrective activities were not 
stated). Results on an experimenter-made test favored the mastery learning
classes (ES=+.31).

The largest effect sizes by far for any of the studies which met the
inclusion criteria were found in a dissertation by Mevarech (1980). In this
study, students were randomly assigned to eight Algebra I classes in a 2x2
factorial design. One factor was "algorithmic" vs. "heuristic" instruc-
tional strategies. The "algorithmic" treatments emphasized step-by-step
solutions of algebraic problems, focusing on lower cognitive skills. The
"heuristic" treatments emphasized problem solving strategies such as Polya's
(1957) "understanding-planning-carrying out the plan-evaluating" cycle, and
focused on higher cognitive skills. 

The other factor was mastery learning (feedback-correctives) versus non-
mastery. In the mastery learning treatments, students were given formative
tests every two weeks. They then had three chances to meet the mastery cri- 
terion of 80% correct. Corrective instruction included group instruction by 
the teacher and/or the researcher herself; peer tutoring; and tutoring out-
side of class time by teachers and the researcher. The amount of additional
time allocated to provide this corrective instruction is not stated, but the
author claimed the amount of out-of-class tutoring to be small (Z. Mevarech,
personal communication, March 16, 1984). In the non-mastery treatments, 
students studied the same materials and took the formative tests as quizzes. 
To hold the different classes to the same schedule, non-mastery classes were 
given additional problems to work while mastery learning classes were 
receiving corrective instruction. 

The relevant comparisons for the present review involve the mastery
learning vs. non-mastery factor. Within the algorithmic classes, the mas- 
tery learning classes exceeded non-mastery on both "lower mental process" 
items (i.e., algorithms) (ES=+.30) and on "higher mental process" items 
(ES=+.77). Within the heuristic classes, the effects were even greater for 
both "lower mental process" (ES=+.66) and "higher mental process" items
(ES=+.90).

Overall, the evidence for the "extra time" claim is unclear. Effect
sizes for the small number of unequal time studies summarized in Table 3 are
no more positive than were those reported for other studies using experimen-
ter-made measures (Table 2), in which mastery learning classes did not
receive additional time. In fact, both of the unequal time studies which 
assessed retention found that any effects observed at posttest disappeared 
as soon as four weeks later. Substantial achievement effects of extra time 
for corrective instruction appear to depend on provisions of substantial 
amounts of extra time, well in excess of 20-25%. However, studies in which 
large amounts of additional time are provided to the mastery learning 
classes either involve continuous-progress forms of mastery learning or are
extremely brief and artificial. What is needed are long-term evaluations of
mastery learning models in which corrective instruction is given outside of
class time, preferably using standardized measures and/or criterion-refer-
enced measures which register all objectives covered by all classes.

Retention 

A total of six comparisons in five studies assessed retention of achieve- 
ment effects over periods of 4-12 weeks. All six used experimenter-made 
measures. The median effect size overall is essentially zero, with the 
largest retention effect (ES = +.49) appearing in the Anderson et al. (1976) 
study which found no differences on standardized measures.

Discussion



The best evidence from evaluations of practical applications of group- 
based mastery learning indicates that effects of these methods are moderate 
at best on experimenter-made achievement measures closely tied to the objec- 
tives taught in the mastery learning classes, and are essentially nil on 
standardized achievement measures. These findings may be interpreted as
supporting the "weak claim" that mastery learning can be an effective means
of holding teachers and students to a specified set of instructional objec-
tives, but do not support the "strong claim" that mastery learning is more
effective than traditional instruction given equal time and fair achievement
measures. Further, even this "curricular focus" claim is undermined by uncer-
tainties about the degree to which control teachers were trying to achieve
the same objectives as the mastery learning teachers and by a failure to
show effects of mastery learning on retention measures.

These conclusions are radically different from those drawn by earlier 
reviewers and meta-analysts. Not only would a mean effect size across the
sixteen studies emphasized in this review come nowhere near the mean of
around 1.0 claimed by Bloom (1984a, b), Guskey & Gates (1985), Lysakowski &
Walberg (1982), or Walberg (1984), but no single study even approached this
level. Only one of the sixteen studies had mean effect sizes in excess of
the 0.52 mean estimated by Kulik et al. (1986) for pre-college studies of 
mastery testing. How can this gross discrepancy be reconciled? 

First, these different reviews focus on very different sets of studies. 
Almost all of the studies cited in this review would have qualified for
inclusion in any of the meta-analyses, but the reverse is not true. For
example, of 25 elementary and secondary studies cited by Guskey and Gates
(1985), only six qualified for inclusion in the present review. Of 19 such
studies cited by Kulik et al. (1986), only four qualified for inclusion in
the present review. Only two studies, Lueckemeyer & Chiappetta (1981) and
Slavin & Karweit (1984), appeared in all three syntheses. The list of mas-
tery learning studies synthesized by Lysakowski and Walberg (1982) is short
and idiosyncratic, hardly overlapping at all with any of the other reviews,
and Bloom's (1984) article only discusses a few University of Chicago dis- 
sertations. 

As noted earlier, the principal reason that studies cited elsewhere were 
excluded in the present paper is that they did not meet the four-week dura-
tion requirement. The rationale for this restriction is that this review
focuses on the effects of mastery learning in practice, not in theory. It
would be difficult to maintain that a two- or three-week study could produce
information more relevant to classroom practice than a semester- or year-
long study, partly because artificial arrangements possible in a brief study
could not be maintained over a longer period. Actually, even four weeks
could be seen as too short a period for external validity. For example, in
the study with by far the largest mean effect size in the current review
(Mevarech, 1980), the author was involved daily with the mastery learning
classes, providing in-class assistance, corrective instruction, and indivi-
dual tutoring. A few classes might have comparable resources at their dis-
posal for a few weeks, but such an arrangement is unlikely to be feasible
with many classes over a longer period. Had the duration requirement been
set at only eight weeks, the maximum effect size for all studies in the pre-
sent review would have been +0.26 (excluding the experimenter-made measures
in the Anderson et al. (1976) study). 

In addition to excluding many studies cited elsewhere, the present review 
included many studies missed in the meta-analyses. These are primarily dis-
sertations and unpublished papers (mostly AERA papers), which comprise
twelve of the sixteen studies emphasized in this review. Including unpub-
lished studies is critical in any literature review, as they are less likely
to suffer from "publication bias," the tendency for studies reporting non- 
significant or negative results not to be submitted to or accepted by jour- 
nals (see Rosenthal, 1979; Bangert-Drowns, 1986). Other differences in 
study selection and computation of effect size between the present paper and 
earlier reviews are important in specific cases. For example, Guskey & 
Gates (1985) report effect sizes for the Jones, Monsaas, & Katims (1978) 
study of +.41 for an experimenter-made measure and +.33 for a standardized 
test, while the present review estimated effect sizes of +.18 and +.09, 
respectively. The difference is that in the present review pretest differ- 
ences (in this case favoring the experimental group) were subtracted from 
the posttest differences. Similarly, Guskey & Gates (1985) report a single 
effect size of +.58 for the Anderson et al. (1976) study, ignoring the 
striking difference in effects on standardized as opposed to experimenter- 
made measures emphasized here. 
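
The pretest adjustment described above can be made concrete with a short sketch. It is an illustration only: the helper names and the score lists are hypothetical, and dividing by the control group's standard deviation is an assumption of the sketch rather than a statement of the procedure used in this review. The last two lines simply subtract the effect sizes quoted above for the Jones, Monsaas, & Katims (1978) study, which shows the size of the pretest gap that would account for the discrepancy if the adjustment were the only source of disagreement.

```python
from statistics import mean, stdev


def effect_size(experimental, control):
    """Experimental-control mean difference in control-group SD units.

    Using the control group's standard deviation is an assumption
    made for this sketch.
    """
    return (mean(experimental) - mean(control)) / stdev(control)


def pretest_adjusted_effect_size(post_exp, post_ctl, pre_exp, pre_ctl):
    """Posttest effect size minus the pretest effect size."""
    return effect_size(post_exp, post_ctl) - effect_size(pre_exp, pre_ctl)


def implied_pretest_gap(unadjusted_es, adjusted_es):
    """Pretest gap implied if the adjustment alone explains the difference."""
    return unadjusted_es - adjusted_es


# Hypothetical scores: the experimental group starts one point ahead.
pre_ctl, pre_exp = [10, 12, 14, 16, 18], [11, 13, 15, 17, 19]
post_ctl, post_exp = [10, 12, 14, 16, 18], [12, 14, 16, 18, 20]

unadjusted = effect_size(post_exp, post_ctl)
adjusted = pretest_adjusted_effect_size(post_exp, post_ctl, pre_exp, pre_ctl)
print(round(unadjusted, 2), round(adjusted, 2))

# Effect sizes quoted in the text for Jones, Monsaas, & Katims (1978).
print(round(implied_pretest_gap(0.41, 0.18), 2))  # experimenter-made measure
print(round(implied_pretest_gap(0.33, 0.09), 2))  # standardized test
```

In the hypothetical data, ignoring the one-point head start doubles the apparent effect, which is the kind of inflation the adjustment is meant to remove.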

There are several important theoretical and practical issues raised by 
the studies of group-based mastery learning reviewed here. These are dis-
cussed in the following sections.

Why are achievement effects of group-based mastery learning so modest?
The most striking conclusion of the present review is that other than per-
haps focusing teachers and students on a narrow set of objectives, group-
based mastery learning has modest to non-existent effects on student
achievement in studies of at least four weeks' duration. Given the compell-
ing nature of the theory on which mastery learning is based, it is interest-
ing to speculate on reasons for this.

One possible explanation is that the corrective instruction provided in 
practical applications of mastery learning is simply not enough to remediate
the learning deficits of low achievers. In none of the studies emphasized
in this review did corrective instruction occupy more than one period per
week, or 20% of all instructional time. This may be enough to get students
up to criterion on very narrowly defined skills, but not enough to identify
and remediate serious deficits, particularly when corrective instruction is
given in group settings or by peer tutors (as opposed to adult tutors).
Studies of students' pace through individualized materials routinely find
that the slowest students require 200-600% more time than the fastest stu-
dents to complete the same amount of material (Arlin & Webster, 1976; Car-
roll, 1963; Suppes, 1964), far more than what schools using mastery learning
are likely to be able to provide for corrective instruction (Arlin, 1982).

The amount of corrective instruction given in practical applications of 
group-based mastery learning may be not only too little, but also too late. 
It may be that one or two weeks is too long to wait to correct students'
learning errors; if each day's learning is a prerequisite for the next day's
lesson, then perhaps detection and remediation of failures to master indivi-
dual skills needs to be done daily to be effective. Further, in most appli-
cations of mastery learning, students may have years of accumulated learning
deficits that one day per week of corrective instruction is unlikely to 
remediate. 

Time for corrective instruction in group-based mastery learning is pur-
chased at a cost in terms of slowing instructional pace. If this time does
not produce a substantial impact on the achievement of large numbers of stu-
dents, then a widespread though small negative impact on the learning of the
majority will balance a narrow positive impact on the learning of the few 
students whose learning problems are large enough to need corrective 
instruction but small enough to be correctable in one class period per week 
or less. 

However, it may be that the feedback-corrective cycle evaluated in the 
studies reported here is simply insufficient in itself to produce a substan- 
tial improvement in student achievement. As Bloom (1980, 1984b) has noted, 
there are many variables other than feedback-correction that should go into 
an effective instructional program. Both the process of learning and the
process of instruction are so complex that it may be unrealistic to expect
large effects on broadly-based achievement measures from any one factor;
instructional quality, adaptation to individual needs, motivation, and
instructional time may all have to be impacted at the same time to produce
such effects (see Slavin, in press). 



Is Mastery Learning a Robin Hood Approach to Instruction? Several cri-
tics of mastery learning (e.g., Arlin, 1984a; Resnick, 1977) have wondered
whether mastery learning simply shifts a constant amount of learning from
high to low achievers. The evidence from the present review is not incon-
sistent with that view; in several studies positive effects were found for
low achievers only. In fact, given that overall achievement means are not
greatly improved by group-based mastery learning, the reductions in standard
deviations routinely seen in studies of these methods and corresponding
decreases in correlations between pretests and posttests are simply statis-
tical indicators of a shift in achievement from high to low achievers. How-
ever, it is probably more accurate to say that group-based mastery learning
trades coverage for mastery. Because rapid coverage is likely to be of
greatest benefit to high achievers while high mastery is of greatest benefit
to low achievers, resolving the coverage-mastery dilemma as recommended by
mastery learning theorists is likely to produce a "Robin Hood" effect as a
byproduct. 
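
The statistical signature described above, an unchanged mean with a reduced standard deviation, can be shown with a toy example. The scores and the compression factor are invented for illustration and are not drawn from any study in this review; the sketch also says nothing about pretest-posttest correlations, which would require a model with random error.

```python
from statistics import mean, pstdev


def compress_toward_mean(scores, factor=0.5):
    """Pull every score toward the group mean by a fixed factor.

    The group mean is unchanged: scores below the mean rise and
    scores above the mean fall by a proportional amount.
    """
    m = mean(scores)
    return [m + factor * (s - m) for s in scores]


# Hypothetical achievement scores without the "Robin Hood" shift.
baseline = [40, 50, 60, 70, 80]
shifted = compress_toward_mean(baseline)

print(mean(baseline), mean(shifted))
print(round(pstdev(baseline), 2), round(pstdev(shifted), 2))
print(shifted[0] - baseline[0], shifted[-1] - baseline[-1])
```

The lowest scorer gains exactly what the highest scorer loses, so a smaller standard deviation by itself is no evidence that overall achievement improved.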

It is important to note that the coverage vs. mastery dilemma exists in
all whole-class, group-paced instruction, and the "Robin Hood" effect may be
produced in traditional instruction. For example, Arlin and Westbury (1976)
compared individualized instruction to whole-class instruction, and found
that the instructional pace set by the teachers using the whole-class
approach was equal to that of students in the twenty-third percentile in the
individualized classes, supporting Dahllof's (1971) contention that teachers
set their instructional pace according to the needs of a "steering group" of
students in the tenth to twenty-fifth percentile of the class ability dis-
tribution. Assuming that an instructional pace appropriate for students at
the twenty-third percentile is too slow for higher achievers (Barr, 1974,
1975), then whole-class instruction in effect holds back high achievers for
the benefit of low achievers. Group-based mastery learning may thus be 
accentuating a "Robin Hood" tendency already present in the class-paced 
traditional models to which it has been compared. 

The coverage vs. mastery dilemma and the corresponding "Robin Hood" 
effect are problematic only within the context of group-based mastery learn-
ing, and (at least in theory) only when instruction time is held constant.
In continuous-progress or individualized forms of mastery learning in which
students can move through material more or less at their own rates, the cov-
erage-mastery dilemma is much less of a concern (Arlin & Westbury, 1976).
This does not imply that continuous-progress forms of mastery learning are
necessarily more effective than group-based forms; individualization solves
the instructional pace problem but creates new problems, such as the diffi-
culty of providing adequate direct instruction to students performing at
many levels (Slavin, 1984b). However, there are examples of continuous-pro-
gress mastery learning programs which have positive effects on standardized
achievement tests (see, for example, Cohen, 1977; Cohen & Rodriquez, 1980;
Slavin, Madden, & Leavey, 1984; Slavin & Karweit, 1985).

Importance of Frequent, Criterion-Referenced Feedback. Even if we accept
the "weak claim" that mastery learning is an effective means of holding
teachers and students to a valuable set of instructional objectives, there is
still some question of which elements of mastery learning account for its
effects on experimenter-made, criterion-referenced measures. There is some
evidence that much of this effect may be accounted for by frequent testing
and feedback to students rather than the entire feedback-corrective cycle.
Kulik et al. (1986) report that mastery learning studies which failed to
control for frequency of testing produced mean effect sizes almost twice
those associated with studies in which mastery learning and control classes
were tested with equal frequency. Long et al. (1978) compared mastery
learning to a condition with the same frequency of testing and found a much
smaller effect than in a comparison with a control group that did not
receive tests. Looking across other studies, the pattern is complicated by
the fact that most which held testing frequency constant also held the con-
trol groups to a slower pace than they might otherwise have attained.

Practical Implications. The findings of the present review should not
necessarily be interpreted as justifying an abandonment of mastery learning,
either as an instructional practice or as a focus of research. Several 
widely publicized school improvement programs based on mastery learning 
principles have apparently been successful (e.g., Abrams, 1983; Levine & 
Stark, 1982; Menahem & Weisman, 1985; Robb, 1985), and many effective non-
mastery-learning instructional strategies incorporate certain elements of
mastery learning — in particular, frequent assessment of student learning 
of well-specified objectives and basing teaching decisions on the results of 
these assessments. Further, the idea that students' specific learning defi- 
cits should be remediated immediately instead of being allowed to accumulate 
into large and general deficiencies makes a great deal of sense. It may be 
that more positive results are obtained in continuous-progress forms of mas- 
tery learning, in which students work at their own levels and rates. Use of 
Chapter I, special education, or other resources to provide substantial 
amounts of instructional time to help lower-achieving students keep up with
their classmates in critical basic skills may also increase student achieve-
ment. This review only concerns the achievement effects of the group-based
form of mastery learning (Block & Anderson, 1975) most commonly used in ele-
mentary and secondary schools.

The "Two Sigma Problem" Revisited. One major implication of the present
review is that the "two-sigma" challenge proposed by Bloom (1984) is
probably unrealistic, certainly within the context of group-based mastery
learning. Bloom's claim that mastery learning can improve achievement by more
than one sigma (ES=+1.00) is based on brief, small, artificial studies which 
almost all provided additional instructional time and (in several cases) the 
direct assistance of University of Chicago graduate students to the experi- 
mental classes. In longer-term and larger studies with experimenter-made 
measures, effects of group-based mastery learning are much closer to
one-quarter sigma, and in studies with standardized measures there is no
indication of any positive effect at all. The two-sigma challenge (or one-sigma
claim) is misleading out of context and potentially damaging to educational 
research both within and outside of the mastery learning tradition, as it 
may lead researchers to belittle true, replicable, and generalizable 
achievement effects in the more realistic range of 20-50% of an individual- 
level standard deviation. For example, an educational intervention which 
produced a reliable gain of .33 each year could, if applied to lower-class 
schools, wipe out the typical achievement gap between lower- and middle- 
class children in three years — no small accomplishment. Yet the claims 
for huge effects made by Bloom and others could lead researchers who find 
effect sizes of "only" .33 to question the value of their methods.
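
The arithmetic behind this example can be made explicit. The following is a minimal sketch, not a procedure taken from the studies reviewed: it assumes an effect size is the difference between group means divided by an individual-level standard deviation (here the control group's, which is an assumption), and that yearly gains simply add up. The function names and the test scores are hypothetical.

```python
def effect_size(treatment_mean, control_mean, control_sd):
    """Standardized mean difference, expressed in control-group SD units.

    Using the control group's SD as the denominator is an assumption of
    this sketch; a pooled SD is another common choice.
    """
    if control_sd <= 0:
        raise ValueError("control_sd must be positive")
    return (treatment_mean - control_mean) / control_sd


def cumulative_gain(annual_effect_size, years):
    """Total gain in SD units, assuming the same gain is added every year."""
    return annual_effect_size * years


if __name__ == "__main__":
    # Hypothetical scores, chosen only to illustrate the calculation.
    print(round(effect_size(55.0, 50.0, 15.0), 2))
    # The example in the text: a reliable gain of .33 per year for three years.
    print(round(cumulative_gain(0.33, 3), 2))
```

Under these assumptions, three years at .33 per year accumulates to roughly one standard deviation, which is the size of gap the example in the text appears to have in mind; the text itself does not state the gap in standard deviation units.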

Clearly, much more research is needed to explore the issues raised in
this review. More studies of practical, long-term applications of mastery
learning assessing the effects of these programs on broadly based, fair
measures of achievement are especially needed; idiosyncratic features of the
seven studies which used standardized tests preclude any interpretation of
those studies as evidence that group-based mastery learning is not effec- 
tive. In addition, studies carefully examining instructional pace in mas- 
tery and non-mastery models are needed to shed light on the coverage-mastery 
dilemma discussed here. Mastery learning models in which Chapter I or other
remedial teachers provide significant amounts of corrective instruction out- 
side of regular class time might be developed and evaluated, as well as 
models providing daily, brief corrective instruction rather than waiting for 
learning deficits to accumulate over one or more weeks. The disappointing 
findings of the studies discussed in this review counsel not a retreat from 
this area of research but rather a redoubling and redirection of efforts to 
understand how the compelling theories underlying mastery learning can 
achieve their potential in practical application. 

Mastery learning theory and research have made an important contribution
to the study of instructional methods. However, to understand this
contribution it is critical to fully understand the conditions under which mastery
learning has been studied, the measures that have been used, and other study 
features which bear on the internal and external validity of the findings. 
This best-evidence synthesis has attempted to clarify what we have learned
from research on mastery learning in the hope that this knowledge will
enrich further research and development in this important area.

See Table 1

Table 2 continued
Equal-Time Studies Using Experimenter-Made Measures

[Most of this table was lost in extraction. Surviving fragments:]
  Treatments:  ML: [text lost] to go on. Corrective activities available
               during free time.
               Control: Used same mtls. [text lost]
  Subjects:    Human [text lost]; [text lost]logy; Physics
  Measure:     Retention (4 wks)

Table 3

Unequal-Time Studies Using Experimenter-Made Measures

Secondary

Long et al., 1978
  Location:     Georgia
  Sample size:  6 cl.
  Duration:     5 wks.
  Treatments:   Tchr-Directed ML: [text lost]
                Student-Directed ML: Same formative [text lost] but no tests,
                [text lost]
  Extra time:   Not Stated
  Subjects:     Earth Sci.
  Effect sizes: Tchr Directed ML vs. Control:
                  Posttest   +.[illegible]3
                  Retention  +.08 (12 wks)
                Tchr Directed ML vs. Student-Directed ML:
                  Posttest   +.19
                  Retention  -.03 (12 wks)

Fagan, 1975
  Location:     Texas
  Sample size:  17 cl.
  Design:       [number illegible] tchrs randomly assigned to ML, control
  Treatments:   ML: Formative tests given every wk. Tchrs drilled [text lost]
                criterion, then gave 2nd formative test.
                Control: Used same mtls. & procedures as ML. Formative tests
                taken as quizzes.
  Extra time:   22%
  Subjects:     Transp. & Environ.
  Effect sizes: Posttest   -.11
                Retention  -.1[illegible] (4 wks)

Table 3 continued

Secondary

Hecht, 1980
  Grades:       10
  Location:     Urban, Suburban Midwest
  Sample size:  5 cl.
  Duration:     6 wks.
  Design:       Students randomly assigned to ML, control classes. Two tchrs
                taught ML & control classes.
  Treatments:   ML: Formative tests given every [illegible] wks., followed by
                intensive remedial help.
                Control: Used same mtls. & procedures as ML, including both
                1st & 2nd formative test but no remedial help.
  Extra time:   Not Stated
  Subjects:     Geometry
  Effect sizes: +.31

Mevarech, 1980
  Location:     Chicago, middle class sch.
  Sample size:  8 cl.
  Duration:     6 wks.
  Design:       Students randomly assigned in 2x2 design to "algorithmic
                strategy" vs. "heuristic strategy" and to ML vs. control.
  Treatments:   ML: Formative tests given every 2 wks. Students had 3 chances
                to obtain 80% criterion. Corr. inst. included grp. inst.,
                peer tutoring, adult tutoring outside of class.
                Control: Used same mtls. & procedures, took formative tests
                as quizzes. While ML classes received corr. inst., control
                worked add'l problems.
  Extra time:   Not Stated
  Subjects:     Alg. I
  Effect sizes: Groups listed: Algorithmic Strategy, Heuristic Strategy.
                Values as extracted (assignment to groups and columns
                unclear): .70, .83, .77